Introduction

There has been much investigation over whether or not futures curves can accurately predict spot prices. In general, experts conclude that due to a varying number of factors, futures curves are generally not a good forecasting tool. The reasons are many, but the most relevant for Brent are as follows:

  1. The effect of John Maynard Keynes’ ‘normal backwardation’ theory — which suggests forward prices will always tend to be discounted to the market’s expected future spot price to give investors an incentive to take on risk from producer hedgers. Therefore, even if the price is estimated correctly, the traded price will tend to under-state the market’s real price forecast.

  2. Risk-adjusted premium, which is added in by traders to reflect the added risk of trading futures at any given time. This premium is a complex calculation, often subjective, but involving such factors as consumer sentiment, interest rates, and general economic health.

  3. The curve fails to account for the ‘real’ inflation-adjusted value.

Source: Financial Times

However, in our past analysis, we discovered that compared to the EIA and internal IHS forecasts, futures curves outperform both consistently, up to the 12-18 month horizon from which we performed our analysis. In this analysis, we will dive deeper into the performance characteristics of Brent forward curves and examine accuracies on different forecast horizons and on differing points of the curve. This analysis is meant to help us gain a basic understanding of how the accuracy of futures curves may fluctuate over time, and sets us up nicely to analyze the curves on a risk-premium adjusted basis, which should theoretically yield higher accuracies.

A First Graph

It may be fruitful to preface our analysis with a nice visualization generated by Timera Energy - the progression of oil futures throughout the past decade. From this visualization, we can get a high level idea of the general trend of futures curves and their movement tendencies.

Progression of futures curves throughout the past decade

Data Import and Cleaning

We start with the easy part, importing and cleaning the oil prices on a monthly and daily frequency. We receive monthly oil price from IHS and daily from Quandl.

#Part II: import monthly prices

oilPrice <- as.data.frame(AACloudTools::SqlToDf("SELECT * FROM eaa_prod.eaa_data WHERE name = 'SPBRENTaUK.M'"))

oilPrice$date <- substr(oilPrice$date, 1, 7)

oilPrice$name <- NULL

colnames(oilPrice) <- c('enddate', 'actualatenddate')

head(oilPrice)
##   enddate actualatenddate
## 1 1996-07        19.59700
## 2 1990-07        17.25000
## 3 2016-08        46.82043
## 4 2012-07       102.25667
## 5 2011-07       116.54850
## 6 2013-09       111.56550
#import daily prices 

oilPriceDaily <- Quandl("EIA/PET_RBRTE_D")

head(oilPriceDaily, 20)
##          Date Value
## 1  2017-06-26 44.09
## 2  2017-06-23 44.14
## 3  2017-06-22 44.46
## 4  2017-06-21 44.62
## 5  2017-06-20 43.98
## 6  2017-06-19 45.93
## 7  2017-06-16 45.70
## 8  2017-06-15 45.61
## 9  2017-06-14 45.47
## 10 2017-06-13 46.95
## 11 2017-06-12 47.18
## 12 2017-06-09 46.64
## 13 2017-06-08 46.30
## 14 2017-06-07 47.08
## 15 2017-06-06 48.11
## 16 2017-06-05 48.25
## 17 2017-06-02 48.46
## 18 2017-06-01 50.41
## 19 2017-05-31 49.40
## 20 2017-05-30 50.65
#fill in missing days with previous known price

allDays <- seq(min(oilPriceDaily$Date), max(oilPriceDaily$Date), 1)

allDays <- as.data.frame(allDays[!(allDays %in% oilPriceDaily$Date)])

oilPriceDaily <- rbind.fill(oilPriceDaily, allDays)

for(i in 1:nrow(oilPriceDaily)) {
  
if(is.na(oilPriceDaily$Date[i]) == TRUE) {
  
  oilPriceDaily$Date[i] <- oilPriceDaily$`allDays[!(allDays %in% oilPriceDaily$Date)]`[i]
        
}
}

oilPriceDaily <- oilPriceDaily[order(oilPriceDaily$Date),]

oilPriceDaily$Value <- na.locf(oilPriceDaily$Value)

oilPriceDaily$`allDays[!(allDays %in% oilPriceDaily$Date)]` <- NULL

head(oilPriceDaily)
##            Date Value
## 7646 1987-05-20 18.63
## 7645 1987-05-21 18.45
## 7644 1987-05-22 18.55
## 7647 1987-05-23 18.55
## 7648 1987-05-24 18.55
## 7643 1987-05-25 18.60

The next stages of our study will require us to split the analysis into two sections, monthly-aggregate and daily-level analysis. In the monthly-aggregate level analysis, we aggregate all futures curves for the month and take their average. The reason we aggregate for the month is because prices are on a monthly level, so it would benefit us to analyze the futures curves on a monthly level as well. The methodology we used to aggregate was as follows:

Take each vignette in a month, and for the same expiration date, average their expiration prices together.

Ex. For a vignette of July of 2010, expiration date August of 2010, we averaged together the expiration price of August of 2010 for each day in July of 2010. So, we took the mean of the expiration price for August 2010 for July 1st, 2nd, 3rd…31st. There is only one expiration price for each month.

Monthly-Aggregate Level Analysis

We first need to import and clean the Morningstar data.

#look at morningstar data

ForwardCurves <- read.csv('BrentCrude2000-2009.csv')

head(ForwardCurves)
##   INSTRUMENT_ID                 SYMBOL_DESCRIPTION EXPIRATION_DATE
## 1      1tBRNG00 Feb 00 ICE Brent Crude Oil Futures     14-Jan-2000
## 2      1tBRNH00 Mar 00 ICE Brent Crude Oil Futures     14-Feb-2000
## 3      1tBRNJ00 Apr 00 ICE Brent Crude Oil Futures     16-Mar-2000
## 4      1tBRNK00 May 00 ICE Brent Crude Oil Futures     13-Apr-2000
## 5      1tBRNM00 Jun 00 ICE Brent Crude Oil Futures     16-May-2000
## 6      1tBRNN00 Jul 00 ICE Brent Crude Oil Futures     15-Jun-2000
##   TRADE_DATETIME  OPEN  HIGH   LOW CLOSE VOLUME OPEN_INTEREST
## 1    04-Jan-2000 23.90 24.70 23.89 24.39  32509         55140
## 2    04-Jan-2000 23.15 23.90 23.15 23.69  14208         58121
## 3    04-Jan-2000 22.73 23.00 22.40 22.86   4303         26409
## 4    04-Jan-2000 21.95 22.21 21.74 22.07   1487         15367
## 5    04-Jan-2000 21.33 21.42 20.98 21.42    694         23273
## 6    04-Jan-2000 20.60 20.94 20.60 20.94     10         10352

As we can see there needs to be significant wrangling done to get the data into a usable format. In addition the data does not come aggregated, so we will need to aggregate the futures curve by month for the first stage for our analysis.

#clean date field

ForwardCurves$EXPIRATION_DATE <- dmy(ForwardCurves$EXPIRATION_DATE)

ForwardCurves$TRADE_DATETIME <- dmy(ForwardCurves$TRADE_DATETIME)

#aggregate by month

ForwardCurves$TRADE_YEARMON <- substr(ForwardCurves$TRADE_DATETIME, 1, 7)

ForwardCurves$EXPIRATION_YEARMON <- substr(ForwardCurves$EXPIRATION_DATE, 1, 7)

ForwardCurves$UniquePairing <- paste(ForwardCurves$EXPIRATION_YEARMON, ForwardCurves$TRADE_YEARMON)

ForwardCurvesAgg <- summaryBy(CLOSE ~ UniquePairing, data=ForwardCurves, FUN=mean)

ForwardCurvesAgg$valuationdate <- substr(ForwardCurvesAgg$UniquePairing, 9, 16)
ForwardCurvesAgg$enddate <- substr(ForwardCurvesAgg$UniquePairing, 1, 7)

ForwardCurvesAgg$UniquePairing <- NULL

ForwardCurvesAgg$actualatenddate <- vlookup(ForwardCurvesAgg$enddate, oilPrice, 2)

ForwardCurvesAgg <- dplyr::select(ForwardCurvesAgg, 2, 3, 1, 4)

colnames(ForwardCurvesAgg) <- c('valuationdate', 'enddate', 'price', 'actualatenddate')

morningstar <- ForwardCurvesAgg

head(morningstar)
##   valuationdate enddate    price actualatenddate
## 1       2000-01 2000-01 24.27111          25.514
## 2       2000-01 2000-02 24.98600          27.760
## 3       2000-02 2000-02 27.02800          27.760
## 4       2000-01 2000-03 24.20300          27.285
## 5       2000-02 2000-03 26.47667          27.285
## 6       2000-03 2000-03 29.00000          27.285

The morningstar data, detailing Brent forward prices prior to 2009, has now been adequately cleaned and readied for the next stage of our analysis. We now import the remainder of the futures data, from Totem

#import totem futures

futures <- AACloudTools::SqlToDf("select valuationdate, enddate, name, totemtype, price, compositeprice
from eaa_prod.totem
where
name in ('BRENT','BRENT (BULLET)') and
totemgroup = 'Crude Oil'
and period = 'Month'
                and totemtype in ('BulletSwap' , 'Swap')
order by valuationdate, enddate")

head(futures)
##   valuationdate    enddate  name totemtype price compositeprice
## 1    2009-09-22 2009-10-31 BRENT      Swap    NA       70.89818
## 2    2009-09-22 2009-11-30 BRENT      Swap    NA       71.59046
## 3    2009-09-22 2009-12-31 BRENT      Swap    NA       72.17333
## 4    2009-09-22 2010-01-31 BRENT      Swap    NA       72.83133
## 5    2009-09-22 2010-02-28 BRENT      Swap    NA       73.43000
## 6    2009-09-22 2010-03-31 BRENT      Swap    NA       73.97434

We see that this set needs significant cleaning and aggregation as well, as well as the determination of the difference between composite price and price. We need to determine if the two are interchangable, as there are many dates where only one or the other exist.

#first determine the validity of using compositeprice when there is no 'price' value

futures$diff <- (futures$price - futures$compositeprice)/futures$compositeprice
futures <- futures[complete.cases(futures), ]

#shows a mean .01% difference and 1.5% max difference over the entire time period
max(abs(futures$diff))
## [1] 0.01567301
mean(abs(futures$diff))
## [1] 0.0003022013
futures$valuationdate <- lubridate::as_date(futures$valuationdate)
futures$enddate <- as_date(futures$enddate)

#shows difference increasing over time, but always staying below 1.5%
ggplot() + 
  geom_line(data = futures, aes(x = enddate, y = abs(diff)*100, color = "diff")) + 
  ggtitle("Percent Difference between Composite Price and Price over Time") +
  xlab('Date') +
  ylab('% Difference')

We see that there is a very small difference (.01% on average) between compositeprice and price, with a max of 1.5%. We next investigate if there is a pattern to the differences with a heatmap.

wk <- function(x) as.numeric(format(x, "%U"))

futures$dayofweek <- weekdays(as.Date(futures$valuationdate))
futures$month <- lubridate::month(futures$valuationdate)
futures$monthweek <- wk(futures$valuationdate) - wk(as.Date(cut(futures$valuationdate, "month"))) + 1

for(i in 1:nrow(futures)) {
  
if(futures$monthweek[i] == 6) {
  futures$monthweek[i] <- 5
}
  
}

futures$year <- lubridate::year(futures$valuationdate)

ggplot(futures, aes(monthweek, dayofweek, fill = abs(diff * 100))) + 
  geom_tile(colour = "white") + 
  facet_grid(year~month) + 
  scale_fill_gradient(low="white", high="red") +
  labs(x="Week of Month",
       y="",
       title = "Time-Series Calendar Heatmap", 
       subtitle="Differences between Composite Price and Price, Percent", 
       fill="Percent Difference")

We see that differences tend to become more pronounced around the extrema of our date range. However, the percent difference never exceeds 1.5%, and for the vast majority of the time stayed underneath 0.4%. We determine that for the purposes of evaluating accuracy, this is an acceptable spread, as we’ll see later that this is orders of magnitude smaller than the average forecast error. Next we begin our cleaning.

#refresh and clean totem futures and aggregate by month

futures <- AACloudTools::SqlToDf("select valuationdate, enddate, name, totemtype, price, compositeprice
from eaa_prod.totem
where
name in ('BRENT','BRENT (BULLET)') and
totemgroup = 'Crude Oil'
and period = 'Month'
                and totemtype in ('BulletSwap' , 'Swap')
order by valuationdate, enddate")

futures$uniquepair <- paste(futures$valuationdate, futures$enddate, sep = ':')

futures$diffperc <- (futures$price - futures$compositeprice)/futures$compositeprice

futures$price[is.na(futures$price)] <- as.numeric(futures$compositeprice[is.na(futures$price)])

futures <- dplyr::select(futures, 1,2,5,7)

futures$valuation_yearmon <- substr(futures$valuationdate, 1, 7)
futures$expiration_yearmon <- substr(futures$enddate, 1, 7)

futures$uniquepair <- paste(futures$valuation_yearmon, futures$expiration_yearmon)

head(futures)
##   valuationdate    enddate    price      uniquepair valuation_yearmon
## 1    2009-09-22 2009-10-31 70.89818 2009-09 2009-10           2009-09
## 2    2009-09-22 2009-11-30 71.59046 2009-09 2009-11           2009-09
## 3    2009-09-22 2009-12-31 72.17333 2009-09 2009-12           2009-09
## 4    2009-09-22 2010-01-31 72.83133 2009-09 2010-01           2009-09
## 5    2009-09-22 2010-02-28 73.43000 2009-09 2010-02           2009-09
## 6    2009-09-22 2010-03-31 73.97434 2009-09 2010-03           2009-09
##   expiration_yearmon
## 1            2009-10
## 2            2009-11
## 3            2009-12
## 4            2010-01
## 5            2010-02
## 6            2010-03
futuresAgg <- summaryBy(price ~ uniquepair, data = futures, FUN = mean)

futuresAgg$valuationdate <- substr(futuresAgg$uniquepair, 1, 7)
futuresAgg$enddate <- substr(futuresAgg$uniquepair, 9, 16)

futuresAgg$uniquepair <- NULL

colnames(futuresAgg) <- c('price', 'valuationdate', 'enddate')

head(futuresAgg)
##      price valuationdate enddate
## 1 67.18578       2009-09 2009-10
## 2 67.91482       2009-09 2009-11
## 3 68.54009       2009-09 2009-12
## 4 69.26619       2009-09 2010-01
## 5 69.93223       2009-09 2010-02
## 6 70.54456       2009-09 2010-03
#join morningstar, totem futures, and actual prices

futuresAllMonthly <- rbind.fill(morningstar, futuresAgg)

futuresAllMonthly$actualatenddate <- vlookup(ref = futuresAllMonthly$enddate, oilPrice, 2)

head(futuresAllMonthly)
##   valuationdate enddate    price actualatenddate
## 1       2000-01 2000-01 24.27111          25.514
## 2       2000-01 2000-02 24.98600          27.760
## 3       2000-02 2000-02 27.02800          27.760
## 4       2000-01 2000-03 24.20300          27.285
## 5       2000-02 2000-03 26.47667          27.285
## 6       2000-03 2000-03 29.00000          27.285

After the data cleaning we are finally able to begin our data analysis. We first split the data into different time horizons to see if the mean and standard deviation of accuracy changes over time.

#analyze accuracy at different horizons

futuresAllMonthly$valuationdate <- paste(futuresAllMonthly$valuationdate, '01', sep = '-')
futuresAllMonthly$enddate <- paste(futuresAllMonthly$enddate, '01', sep = '-')

futuresAllMonthly$valuationdate <- lubridate::as_date(futuresAllMonthly$valuationdate)
futuresAllMonthly$enddate <- lubridate::as_date(futuresAllMonthly$enddate)

futuresAllMonthly$daysApart <- futuresAllMonthly$enddate - futuresAllMonthly$valuationdate

futuresAllMonthly$diff <- (futuresAllMonthly$price - futuresAllMonthly$actualatenddate)/(futuresAllMonthly$actualatenddate)

#3 month horizon mean diff
futuresAllMonthly90 <- futuresAllMonthly[futuresAllMonthly$daysApart <= 90,]

mean(futuresAllMonthly90$diff[!is.na(abs(futuresAllMonthly90$diff))])
## [1] 0.003146378
#6 month horizon mean diff
futuresAllMonthly180 <- futuresAllMonthly[futuresAllMonthly$daysApart <= 180,]

mean(futuresAllMonthly180$diff[!is.na(abs(futuresAllMonthly180$diff))])
## [1] 0.01411623
#12 month horizon mean diff
futuresAllMonthly365 <- futuresAllMonthly[futuresAllMonthly$daysApart <= 365,]

mean(futuresAllMonthly365$diff[!is.na(abs(futuresAllMonthly365$diff))])
## [1] 0.02061838
#24 month horizon mean diff
futuresAllMonthly730 <- futuresAllMonthly[futuresAllMonthly$daysApart <= 730,]

mean(futuresAllMonthly730$diff[!is.na(abs(futuresAllMonthly730$diff))])
## [1] 0.04513897

We see that both the mean and standard deviation rise when we expand our evaluation of the horizon of forecast. We next look at error over time.

futuresAllMonthly$dayBucket <- round_any(as.numeric(futuresAllMonthly$daysApart), 50)

#demonstrate that as expiration dates get further into the future, error increases
ggplot(futuresAllMonthly, aes(x=dayBucket, y=abs(diff * 100))) + stat_summary(fun.y="mean", geom="bar") + ggtitle('Days Apart vs. Error') + xlab('Days Apart (Expiration Date - Vignette)') + ylab('Percent Difference')
## Warning: Removed 7208 rows containing non-finite values (stat_summary).

We see that the pattern of error confirms our initial claim that mean and standard deviation of error rise over time.

#demonstrate relative peaks in error (2009 and 2016 as expected)
ggplot(futuresAllMonthly, aes(x=enddate, y=abs(diff * 100))) + stat_summary(fun.y="mean", geom="bar") + ggtitle('Average Error by Date') + ylab('Percent Difference')
## Warning: Removed 7208 rows containing non-finite values (stat_summary).

In addition and as expected, error is maximized over turbulent times in the oil price market (2009 and 2016).

#demonstrate over and underestimations
ggplot(futuresAllMonthly, aes(x=enddate, y=diff * 100)) + stat_summary(fun.y="mean", geom="bar") + ggtitle('Average Error by Date') + ylab('Percent Difference')
## Warning: Removed 7208 rows containing non-finite values (stat_summary).

Precisely, as shown above, it is overestimations that occur during the 2009 and 2016 period, and underestimations in the 2008 period as oil prices become inflated faster than market expectations.

End of Month Level Analysis

In this next section, we only look at futures from the perspective of the last day of the month. The idea behind this is on the last day of the month, the futures curve has taken in the most information about the future and therefore should be the most accurate.

#take last day of month of totem futures

lastDays <- seq(as.Date("2001-01-01"), length=1200, by="1 month") - 1
lastDaysTwo <- seq(as.Date("2001-01-01"), length=1200, by="1 month") - 2
lastDaysThree <- seq(as.Date("2001-01-01"), length=1200, by="1 month") - 3

lastDays <- as.data.frame(lastDays)
lastDays$one <- 1

lastDaysTwo <- as.data.frame(lastDaysTwo)
lastDaysTwo$one <- 1

lastDaysThree <- as.data.frame(lastDaysThree)
lastDaysThree$one <- 1

futures$valuationdate <- as_date(futures$valuationdate)
futures$enddate <- as_date(futures$enddate)

futures$lastday <- vlookup(futures$valuationdate, lastDays, 2)
futures$lastdaytwo <- vlookup(futures$valuationdate, lastDaysTwo, 2)
futures$lastdaythree <- vlookup(futures$valuationdate, lastDaysThree, 2)

futuresDaily <- futures

futuresDaily[is.na(futuresDaily)] <- 0 
futuresDaily$EOM <- futuresDaily$lastday + futuresDaily$lastdaytwo + futuresDaily$lastdaythree

futuresDaily <- futuresDaily[futuresDaily$EOM > 0,]

head(futuresDaily)
##     valuationdate    enddate    price      uniquepair valuation_yearmon
## 334    2009-09-28 2009-10-31 65.95561 2009-09 2009-10           2009-09
## 335    2009-09-28 2009-11-30 66.69022 2009-09 2009-11           2009-09
## 336    2009-09-28 2009-12-31 67.32417 2009-09 2009-12           2009-09
## 337    2009-09-28 2010-01-31 68.06667 2009-09 2010-01           2009-09
## 338    2009-09-28 2010-02-28 68.74767 2009-09 2010-02           2009-09
## 339    2009-09-28 2010-03-31 69.37906 2009-09 2010-03           2009-09
##     expiration_yearmon lastday lastdaytwo lastdaythree EOM
## 334            2009-10       0          0            1   1
## 335            2009-11       0          0            1   1
## 336            2009-12       0          0            1   1
## 337            2010-01       0          0            1   1
## 338            2010-02       0          0            1   1
## 339            2010-03       0          0            1   1
uniqueLastDays <- as.data.frame(unique(futuresDaily$valuationdate))
uniqueLastDays$year <- lubridate::year(uniqueLastDays$`unique(futuresDaily$valuationdate)`)
uniqueLastDays$month <- lubridate::month(uniqueLastDays$`unique(futuresDaily$valuationdate)`)
uniqueLastDays$index <- paste(uniqueLastDays$year, uniqueLastDays$month)

uniqueLastDays$original <- uniqueLastDays$`unique(futuresDaily$valuationdate)`

uniqueLastDays <- arrange(uniqueLastDays, desc(original))

uniqueLastDays <- uniqueLastDays[!duplicated(uniqueLastDays$index),]

futuresDaily <- futures[futures$valuationdate %in% uniqueLastDays$original,]

futuresDaily <- dplyr::select(futuresDaily, 1:3)

futuresDaily$actualatenddate <- vlookup(ref = futuresDaily$enddate, oilPriceDaily, 2)

head(futuresDaily)
##     valuationdate    enddate    price actualatenddate
## 556    2009-09-30 2009-10-31 69.62939           74.91
## 557    2009-09-30 2009-11-30 70.34107           77.77
## 558    2009-09-30 2009-12-31 70.96136           77.91
## 559    2009-09-30 2010-01-31 71.68383           71.20
## 560    2009-09-30 2010-02-28 72.34067           76.36
## 561    2009-09-30 2010-03-31 72.93066           80.37
#join with morningstar futures

ForwardCurvesDaily <- read.csv('forwardcurvessmall2.csv')

ForwardCurvesDaily$X <- NULL

colnames(ForwardCurvesDaily) <- c('enddate', 'valuationdate', 'price')

ForwardCurvesDaily <- unfactor(ForwardCurvesDaily)
ForwardCurvesDaily$enddate <- as_date(ForwardCurvesDaily$enddate)
ForwardCurvesDaily$valuationdate <- as_date(ForwardCurvesDaily$valuationdate)

futuresAllDaily <- rbind.fill(futuresDaily, ForwardCurvesDaily)
futuresAllDaily$enddate <- as_date(futuresAllDaily$enddate)
futuresAllDaily$valuationdate <- as_date(futuresAllDaily$valuationdate)

futuresAllDaily$actualatenddate <- vlookup(ref = futuresAllDaily$enddate, table = oilPriceDaily, column = 2)

futuresAllDaily$daysApart <- futuresAllDaily$enddate - futuresAllDaily$valuationdate
futuresAllDaily$diff <- (futuresAllDaily$price - futuresAllDaily$actualatenddate)/(futuresAllDaily$actualatenddate)

head(futuresAllDaily)
##   valuationdate    enddate    price actualatenddate daysApart         diff
## 1    2009-09-30 2009-10-31 69.62939           74.91   31 days -0.070492766
## 2    2009-09-30 2009-11-30 70.34107           77.77   61 days -0.095524340
## 3    2009-09-30 2009-12-31 70.96136           77.91   92 days -0.089187977
## 4    2009-09-30 2010-01-31 71.68383           71.20  123 days  0.006795369
## 5    2009-09-30 2010-02-28 72.34067           76.36  151 days -0.052636620
## 6    2009-09-30 2010-03-31 72.93066           80.37  182 days -0.092563593

We also investigate how the mean and standard deviation moves over time and compare this to the monthly-aggregate futures.

#3 month horizon mean diff
futuresAllDaily90 <- futuresAllDaily[futuresAllDaily$daysApart <= 90,]

mean(futuresAllDaily90$diff[!is.na(abs(futuresAllDaily90$diff))])
## [1] 0.02312666
#6 month horizon mean diff
futuresAllDaily180 <- futuresAllDaily[futuresAllDaily$daysApart <= 180,]

mean(futuresAllDaily180$diff[!is.na(abs(futuresAllDaily180$diff))])
## [1] 0.04163877
#12 month horizon mean diff
futuresAllDaily365 <- futuresAllDaily[futuresAllDaily$daysApart <= 365,]

mean(futuresAllDaily365$diff[!is.na(abs(futuresAllDaily365$diff))])
## [1] 0.0783109
#24 month horizon mean diff
futuresAllDaily730 <- futuresAllDaily[futuresAllDaily$daysApart <= 730,]

mean(futuresAllDaily730$diff[!is.na(abs(futuresAllDaily730$diff))])
## [1] 0.1475766

Unexpectedly, when we take futures from only the last day of the month, we achieve less accuracy than when we take the average of the month. This could be because the effects of increased market knowledge over the course of the month are dampened quickly over time. Instead, the “wisdom of the crowds” may be a greater force at play here, with the aggregate of data over more periods of time ultimately increasing our forecast accuracy.

Comparing Monthly-Aggregate vs. End of Month Futures

We next compare the two types of looking at futures to see how their errors progress over time. It is clear that end of month futures consistently have more error than monthly mean futures, across all time horizons.

#prepare data for graphing

futuresAllDaily$dayBucket <- round_any(as.numeric(futuresAllDaily$daysApart), 50)

#join by DayBucket

futuresAllDailyDayBucket <- dplyr::select(futuresAllDaily, 6:7)

futuresAllMonthlyDayBucket <- dplyr::select(futuresAllMonthly, 6:7)

futuresAllDailyDayBucket$diffAbs <- abs(futuresAllDailyDayBucket$diff)

futuresAllMonthlyDayBucket$diffAbs <- abs(futuresAllMonthlyDayBucket$diff)

futuresAllDailyDayBucket <- stats::aggregate(futuresAllDailyDayBucket, by=list(unique.values = futuresAllDailyDayBucket$dayBucket), 
                                                FUN=mean, na.rm=TRUE)

futuresAllMonthlyDayBucket <- stats::aggregate(futuresAllMonthlyDayBucket, by=list(unique.values = futuresAllMonthlyDayBucket$dayBucket), 
                                             FUN=mean, na.rm=TRUE)

colnames(futuresAllDailyDayBucket) <- c('unique.values', 'daily.diff', 'daybucket', 'daily.diff.abs')
colnames(futuresAllMonthlyDayBucket) <- c('unique.values', 'monthly.diff', 'daybucket', 'monthly.diff.abs')

futuresAllDayBucket <- futuresAllDailyDayBucket
futuresAllDayBucket$monthly.diff <- vlookup(futuresAllDayBucket$unique.values, futuresAllMonthlyDayBucket, 2)
futuresAllDayBucket$monthly.diff.abs <- vlookup(futuresAllDayBucket$unique.values, futuresAllMonthlyDayBucket, 4)

Daily Differences vs. Monthly-Aggregate Differences, Raw

plot_ly(futuresAllDayBucket, x = ~daybucket, y = ~daily.diff * 100, type = 'bar', name = 'Daily Differences') %>%
  add_trace(y = ~monthly.diff * 100, name = 'Monthly-Aggregate Differences') %>%
  layout(xaxis = list(title = 'Days from Vignette'), yaxis = list(title = 'Difference (%)'), barmode = 'group') 
## Warning: Ignoring 26 observations

## Warning: Ignoring 26 observations
0100020003000020406080100120140
Days from VignetteDifference (%)Daily DifferencesMonthly-Aggregate Differences

Daily Differences vs. Monthly-Aggregate Differences, Absolute Difference

plot_ly(futuresAllDayBucket, x = ~daybucket, y = ~daily.diff.abs * 100, type = 'bar', name = 'Daily Differences') %>%
  add_trace(y = ~monthly.diff.abs * 100, name = 'Monthly-Aggregate Differences') %>%
  layout(xaxis = list(title = 'Days from Vignette'), yaxis = list(title = 'Difference (%)'), barmode = 'group')
## Warning: Ignoring 26 observations

## Warning: Ignoring 26 observations
0100020003000020406080100120140
Days from VignetteDifference (%)Daily DifferencesMonthly-Aggregate Differences

Conclusion

Futures are a decent benchmark for forecasting Brent prices. We have seen that over time, they of course get worse at their predictions, but over a short time horizon (3 months), they are actually remarkably accurate, with a .03% error rate on average across all time periods. This increases substantially as we extend our evaluation horizon, from 1.4% error over 6 months, 2% error over 12 months, and 4.5% error over 24 months. These accuracies need to be evaluated from a business context on a case by case basis, as 4.5% may be unacceptable when implemented in a trading platform, but still of practical value when considering capital investment decisions.